feat(speclib): Skyline transition list CSV support + speclib_build_cli bundle #55
Merged
…r msgpack.zst
Add pub constructors (new()) to PrecursorEntry, ReferenceEG, and SerSpeclibElement so external crates can construct speclib entries. Make PrecursorEntry pub. Add SpeclibWriter<W> wrapping zstd::Encoder for streaming msgpack.zst output. Re-export all four types from data_sources. Two roundtrip tests cover both rmp_serde encode/decode and the full writer→reader pipeline.
- clap CLI with all args + TOML config merge (CLI > TOML > defaults)
- ProForma mod parser: fixed + variable mod application
- Bloom + bucket HashMap peptide dedup
- REVERSE + EDGE_MUTATE decoy strategies
- Example config at repo root
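A minimal sketch of the CLI > TOML > defaults precedence, assuming each source is first parsed into a struct of `Option` fields. `BuildConfig`, `PartialConfig`, `merge`, and every field name and default value here are hypothetical, not the CLI's actual types.

```rust
/// Hypothetical subset of the CLI settings; field names and defaults are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct BuildConfig {
    missed_cleavages: u32,
    max_charge: u8,
    decoy_strategy: String,
}

impl Default for BuildConfig {
    fn default() -> Self {
        // Placeholder defaults, not the real CLI's values.
        Self {
            missed_cleavages: 1,
            max_charge: 3,
            decoy_strategy: "REVERSE".to_string(),
        }
    }
}

/// One layer of settings (parsed CLI args or a parsed TOML file); `None` = not set in this layer.
#[derive(Debug, Default)]
struct PartialConfig {
    missed_cleavages: Option<u32>,
    max_charge: Option<u8>,
    decoy_strategy: Option<String>,
}

/// Resolve every field with precedence CLI > TOML > defaults.
fn merge(cli: PartialConfig, toml: PartialConfig) -> BuildConfig {
    let d = BuildConfig::default();
    BuildConfig {
        missed_cleavages: cli
            .missed_cleavages
            .or(toml.missed_cleavages)
            .unwrap_or(d.missed_cleavages),
        max_charge: cli.max_charge.or(toml.max_charge).unwrap_or(d.max_charge),
        decoy_strategy: cli
            .decoy_strategy
            .or(toml.decoy_strategy)
            .unwrap_or(d.decoy_strategy),
    }
}
```

This layering only works if clap itself does not fill in defaults; otherwise an unpassed flag would always shadow the TOML value.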
- lib.rs re-exports for integration test access
- 3 integration tests: digestion+dedup, mod chain, decoy roundtrip
- Taskfile: speclib:build, speclib:local-koina, speclib:stop-koina
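The digestion+dedup path pairs a Bloom filter with a bucketed `HashMap`. One plausible reading of that design, sketched with only `std`: the Bloom filter answers "definitely new" cheaply, and only a "maybe seen" answer triggers an exact scan of the hash bucket. `Bloom`, `Dedup`, `dedup`, and the sizing (2^16 bits, 4 hash functions) are assumptions, not the crate's code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over string keys, using double hashing for the k probes.
struct Bloom {
    bits: Vec<u64>,
    n_bits: usize,
    k: u32,
}

impl Bloom {
    fn new(n_bits: usize, k: u32) -> Self {
        Self { bits: vec![0; (n_bits + 63) / 64], n_bits, k }
    }

    /// Two independent-ish 64-bit hashes; the second is forced odd.
    fn hashes(key: &str) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        key.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        (key, 0x9e37_79b9_u32).hash(&mut h2);
        (h1.finish(), h2.finish() | 1)
    }

    /// Sets the key's bits; returns true if all of them were already set ("maybe seen").
    fn check_and_insert(&mut self, key: &str) -> bool {
        let (a, b) = Self::hashes(key);
        let mut all_set = true;
        for i in 0..self.k as u64 {
            let idx = (a.wrapping_add(i.wrapping_mul(b)) % self.n_bits as u64) as usize;
            let (word, bit) = (idx / 64, idx % 64);
            if self.bits[word] & (1u64 << bit) == 0 {
                all_set = false;
                self.bits[word] |= 1u64 << bit;
            }
        }
        all_set
    }
}

/// Exact dedup: Bloom prefilter plus hash-keyed buckets holding the actual sequences.
struct Dedup {
    bloom: Bloom,
    buckets: HashMap<u64, Vec<String>>,
}

impl Dedup {
    fn new() -> Self {
        Self { bloom: Bloom::new(1 << 16, 4), buckets: HashMap::new() }
    }

    /// Returns true if `pep` has not been seen before.
    fn insert(&mut self, pep: &str) -> bool {
        let maybe_seen = self.bloom.check_and_insert(pep);
        let bucket = self.buckets.entry(Bloom::hashes(pep).0).or_default();
        // Bloom filters have no false negatives, so "not seen" skips the exact scan.
        if maybe_seen && bucket.iter().any(|p| p == pep) {
            return false;
        }
        bucket.push(pep.to_string());
        true
    }
}

/// Keep the first occurrence of each peptide, preserving input order.
fn dedup(peptides: &[&str]) -> Vec<String> {
    let mut d = Dedup::new();
    let mut out = Vec::new();
    for &p in peptides {
        if d.insert(p) {
            out.push(p.to_string());
        }
    }
    out
}
```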
- Prosit expects bare AA sequences; strip bracket mods before sending
- Convert Koina annotation format (y1+1) to mzPAF format (y1^1)
- Make strip_mods pub for pipeline use
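Both transformations are small string rewrites. A sketch, assuming mods always appear as bracketed tags and the Koina charge is a trailing `+N`; `strip_bracket_mods` and `koina_to_mzpaf` are illustrative names, not the crate's `strip_mods`.

```rust
/// Remove bracketed modifications, e.g. "AC[UNIMOD:4]K" -> "ACK".
fn strip_bracket_mods(seq: &str) -> String {
    let mut out = String::with_capacity(seq.len());
    let mut depth = 0usize;
    for c in seq.chars() {
        match c {
            '[' => depth += 1,
            ']' => depth = depth.saturating_sub(1),
            // Keep residues only when outside any bracket.
            _ if depth == 0 => out.push(c),
            _ => {}
        }
    }
    out
}

/// Convert a Koina-style annotation like "y1+1" to mzPAF charge notation "y1^1".
/// Annotations without a trailing numeric "+N" are returned unchanged.
fn koina_to_mzpaf(ann: &str) -> String {
    match ann.rsplit_once('+') {
        Some((ion, z)) if !z.is_empty() && z.chars().all(|c| c.is_ascii_digit()) => {
            format!("{ion}^{z}")
        }
        _ => ann.to_string(),
    }
}
```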
…warn affected proteins
- Precursor m/z now computed from the modified sequence (includes +57 for Cys carbamidomethylation)
- Koina receives modified sequences with [UNIMOD:N] notation (Prosit handles them)
- to_proforma converts [U:N] → [UNIMOD:N] (rustyms parses [U:N] as the element uranium)
- Isotope distribution also uses the modified formula
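A sketch of the two steps: rewriting `[U:N]` to `[UNIMOD:N]`, then summing monoisotopic residue masses plus mod deltas so the precursor m/z reflects the modified sequence, m/z = (M + z·m_proton)/z. Only UNIMOD:4 (carbamidomethyl, +57.021464 Da) and UNIMOD:35 (oxidation, +15.994915 Da) are in this hypothetical mod table, terminal mods such as `[UNIMOD:1]-` are not handled, and the function names are assumptions rather than the crate's API (which relies on rustyms).

```rust
const WATER: f64 = 18.010565;
const PROTON: f64 = 1.007276;

/// Rewrite shorthand "[U:N]" tags to "[UNIMOD:N]" (already-expanded tags are untouched).
fn to_unimod_tags(seq: &str) -> String {
    seq.replace("[U:", "[UNIMOD:")
}

/// Monoisotopic residue masses (Da) for the 20 standard amino acids.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.02146, 'A' => 71.03711, 'S' => 87.03203, 'P' => 97.05276,
        'V' => 99.06841, 'T' => 101.04768, 'C' => 103.00919, 'L' | 'I' => 113.08406,
        'N' => 114.04293, 'D' => 115.02694, 'Q' => 128.05858, 'K' => 128.09496,
        'E' => 129.04259, 'M' => 131.04049, 'H' => 137.05891, 'F' => 147.06841,
        'R' => 156.10111, 'Y' => 163.06333, 'W' => 186.07931,
        _ => return None,
    })
}

/// Hypothetical mod table: mass deltas for a couple of UNIMOD accessions.
fn unimod_mass(id: u32) -> Option<f64> {
    match id {
        4 => Some(57.021464),  // carbamidomethyl
        35 => Some(15.994915), // oxidation
        _ => None,
    }
}

/// Precursor m/z of a modified sequence; None on unknown residues/mods or charge 0.
fn precursor_mz(seq: &str, charge: u32) -> Option<f64> {
    if charge == 0 {
        return None;
    }
    let seq = to_unimod_tags(seq);
    let mut mass = WATER;
    let mut rest = seq.as_str();
    while let Some(c) = rest.chars().next() {
        if c == '[' {
            // Parse "[UNIMOD:N]" and add that mod's mass delta.
            let end = rest.find(']')?;
            let id: u32 = rest[1..end].strip_prefix("UNIMOD:")?.parse().ok()?;
            mass += unimod_mass(id)?;
            rest = &rest[end + 1..];
        } else {
            mass += residue_mass(c)?;
            rest = &rest[c.len_utf8()..];
        }
    }
    Some((mass + charge as f64 * PROTON) / charge as f64)
}
```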
- Add Prosit_2023_intensity_timsTOF and Prosit_2020_intensity_CID to the model registry
- Default fragment model is now timsTOF-specific (was HCD)
- Same Triton v2 schema, drop-in compatible
- Tolerance::default() now uses RtTolerance::Unrestricted (was Minutes(5,5))
- Prescore extracts full RT range, calibration maps library→observed RT
- Bench config drops explicit RT tolerance (defaults to unrestricted)
- Users with calibrated libraries can still set "rt": {"minutes": [N, N]} in config
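A sketch of what the new default means when choosing an extraction window, assuming `Minutes(a, b)` means `a` minutes before and `b` minutes after the library RT (the field order is an assumption). With `Unrestricted`, the whole run is extracted, which is what an iRT library needs before the library→observed calibration exists. The enum and `window` method are a hypothetical mirror, not timsquery's type.

```rust
/// Hypothetical mirror of the RT tolerance setting.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum RtTolerance {
    /// New default: extract over the full run.
    #[default]
    Unrestricted,
    /// (minutes before, minutes after) the library RT; field order assumed.
    Minutes(f32, f32),
}

impl RtTolerance {
    /// RT window (minutes) to extract around `library_rt`, clamped to the run's RT range.
    fn window(&self, library_rt: f32, run_range: (f32, f32)) -> (f32, f32) {
        match *self {
            RtTolerance::Unrestricted => run_range,
            RtTolerance::Minutes(before, after) => (
                (library_rt - before).max(run_range.0),
                (library_rt + after).min(run_range.1),
            ),
        }
    }
}
```

A fixed window only makes sense once library RTs are in run minutes; an uncalibrated iRT value (or a placeholder 0.0) would otherwise point the window at the wrong part of the gradient.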
New skyline_io module parses Skyline Peptide Transition List exports as a spectral library source, mirroring the DIA-NN/Spectronaut pattern (sniff + group-by-precursor + emit TimsElutionGroup<IonAnnot> plus SkylinePrecursorExtras). Wires it through timsquery's dispatch, timsquery_viewer's extras view, and timsseek's speclib converter (reusing convert_diann_to_query_item via a small extras adapter).
Notable behaviors:
- Precursor-isotope rows (Fragment Ion Type == "precursor") are skipped; downstream computes the envelope from the sequence.
- `#N/A` Library Intensity values default to 1.0.
- RT/IM columns are absent in Skyline exports, so both default to 0.0 with a loud warning (use Unrestricted RT tolerance).
- Modifications in `[...]` are stripped for the stripped_peptide field; modified_peptide is preserved verbatim.
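A std-only sketch of the group-by-precursor step and the row rules listed above, assuming rows have already been split into fields. `Row`, `Precursor`, `group_rows`, and the field names are hypothetical and do not match the Skyline column headers or `SkylinePrecursorExtras`; real CSV handling (quoting, header sniffing) is omitted.

```rust
use std::collections::BTreeMap;

/// One already-split transition-list row (hypothetical field names).
struct Row<'a> {
    modified_peptide: &'a str,
    precursor_charge: u8,
    fragment_ion_type: &'a str, // e.g. "precursor", "y", "b"
    fragment_ion: &'a str,      // e.g. "y3"
    library_intensity: &'a str, // may be "#N/A"
}

#[derive(Debug, PartialEq)]
struct Precursor {
    modified_peptide: String, // kept verbatim
    stripped_peptide: String, // `[...]` removed
    charge: u8,
    rt: f32, // no RT column in the export: defaults to 0.0
    fragments: Vec<(String, f32)>,
}

fn strip_brackets(seq: &str) -> String {
    let mut out = String::new();
    let mut depth = 0u32;
    for c in seq.chars() {
        match c {
            '[' => depth += 1,
            ']' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(c),
            _ => {}
        }
    }
    out
}

/// Group fragment rows by (modified peptide, charge), applying the skip/default rules.
fn group_rows(rows: &[Row<'_>]) -> Result<Vec<Precursor>, String> {
    let mut groups: BTreeMap<(&str, u8), Vec<(String, f32)>> = BTreeMap::new();
    for r in rows {
        // Precursor-isotope rows are dropped; the envelope is computed from the sequence later.
        if r.fragment_ion_type == "precursor" {
            continue;
        }
        let intensity = match r.library_intensity.trim() {
            "#N/A" => 1.0,
            s => s.parse::<f32>().map_err(|e| format!("bad intensity {s:?}: {e}"))?,
        };
        groups
            .entry((r.modified_peptide, r.precursor_charge))
            .or_default()
            .push((r.fragment_ion.to_string(), intensity));
    }
    if !groups.is_empty() {
        eprintln!("WARNING: no RT/IM columns in Skyline export; using 0.0 (use Unrestricted RT tolerance)");
    }
    Ok(groups
        .into_iter()
        .map(|((pep, z), fragments)| Precursor {
            modified_peptide: pep.to_string(),
            stripped_peptide: strip_brackets(pep),
            charge: z,
            rt: 0.0,
            fragments,
        })
        .collect())
}
```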
…d_cli
Drops the Python speclib_builder package and all references (root pyproject workspace/deps, hatch packages, uv sources, uv.lock). The Rust speclib_build_cli now owns the digest → dedup → expand → Koina predict → write pipeline end-to-end.
No notebooks in the repo and no other references to jupyter. Prunes ~100 transitive deps from uv.lock.
Summary
- `skyline_io` module mirrors the DIA-NN/Spectronaut pattern (sniff + group-by-precursor → `TimsElutionGroup<IonAnnot>` + `SkylinePrecursorExtras`), wired through timsquery dispatch, timsquery_viewer extras, and the timsseek speclib converter.
- `main`: the `speclib_build_cli` Rust CLI (digest → dedup → expand → Koina predict → write msgpack.zst), Prosit_2023_intensity_timsTOF default, iRT-library RT tolerance default (`Unrestricted`), and related fixes.

Skyline specifics
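- Precursor-isotope rows (`Fragment Ion Type == "precursor"`) skipped; envelope computed from sequence downstream.
- `#N/A` Library Intensity → default 1.0.
- RT/IM columns absent → default 0.0 with a loud warning (use `Unrestricted` RT tolerance).
- Modifications in `[...]` stripped for `stripped_peptide`; `modified_peptide` preserved verbatim.

Test plan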
- `cargo test -p timsquery --lib serde::`: 19 tests pass (6 new skyline_io tests)
- `cargo test -p timsseek --lib speclib`: 12 tests pass (`test_load_skyline_csv_library` new)
- `cargo build -p timsquery -p timsseek -p timsquery_viewer` clean